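Automated detection of retinal structures, such as retinal vessels (RV), the foveal avascular zone (FAZ), and retinal vascular junctions (RVJ), is of great importance for understanding diseases of the eye and for clinical decision-making. In this paper, we propose a novel Voting-based Adaptive Feature Fusion multi-task network (VAFF-Net) for joint segmentation, detection, and classification of RV, FAZ, and RVJ in optical coherence tomography angiography (OCTA). A task-specific voting gate module is proposed to adaptively extract and fuse different features for specific tasks at two levels: features at different spatial positions from a single encoder, and features from multiple encoders. In particular, since the complexity of the microvasculature in OCTA images makes the simultaneous localization and classification of retinal vascular junctions into bifurcations and crossings a challenging task, we specifically design a task head that combines heatmap regression and grid classification. We take advantage of three different en face angiograms from various retinal layers, rather than following existing methods that use only a single en face angiogram. To facilitate further research, part of these datasets has been released for public access: https://github.com/imed-lab/vaff-net.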
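Edge computing is widely used for video analytics. To alleviate the inherent tension between accuracy and cost, various video analytics pipelines have been proposed to optimize the usage of GPUs on edge nodes. Nonetheless, we find that the GPU compute resources provisioned for edge nodes are commonly under-utilized due to video content variations, subsampling, and filtering at different places of a pipeline. As opposed to model and pipeline optimization, in this work we study the problem of opportunistic data enhancement using the non-deterministic and fragmented idle GPU resources. Specifically, we propose a task-specific discrimination and enhancement module and a model-aware adversarial training mechanism, providing a way to identify and transform the low-quality images that are specific to a video pipeline in an accurate and efficient manner. A multi-exit model structure and a resource-aware scheduler are further developed to make online enhancement decisions and perform fine-grained execution under latency and GPU resource constraints. Experiments across multiple video analytics pipelines and datasets show that, by judiciously allocating a small amount of idle resources to frames that tend to yield greater marginal benefits from enhancement, our system boosts DNN object detection accuracy by 7.3-11.3% without incurring any latency cost.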
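There has been significant progress in developing reinforcement learning (RL) training systems. Past works such as IMPALA, Apex, Seed RL, Sample Factory, and others aim to improve the system's overall throughput. In this paper, we try to address a common bottleneck in RL training systems, namely parallel environment execution, which is often the slowest part of the whole system but receives little attention. With a curated design for parallelizing RL environments, we have improved the RL environment simulation speed across different hardware setups, ranging from a laptop and a modest workstation to a high-end machine such as the NVIDIA DGX-A100. On a high-end machine, EnvPool achieves 1 million frames per second for environment execution on Atari environments and 3 million frames per second on MuJoCo environments. When running on a laptop, EnvPool is 2.8 times as fast as the Python subprocess. Moreover, great compatibility with existing RL training libraries has been demonstrated in the open-source community, including CleanRL, rl_games, DeepMind Acme, etc. Finally, EnvPool allows researchers to iterate on their ideas at a much faster pace and has great potential to become the de facto RL environment execution engine. Example runs show that it takes only 5 minutes to train Atari Pong and MuJoCo Ant on a laptop. EnvPool has been open-sourced at https://github.com/sail-sg/envpool.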
Hierarchical text classification aims to leverage the label hierarchy in multi-label text classification. Existing methods encode the label hierarchy in a global view, where the label hierarchy is treated as a static hierarchical structure containing all labels. Since the global hierarchy is static and irrelevant to text samples, it is hard for these methods to exploit hierarchical information. In contrast to the global hierarchy, the local hierarchy is a structured label hierarchy corresponding to each text sample. It is dynamic and relevant to text samples, yet it is ignored by previous methods. To exploit both global and local hierarchies, we propose Hierarchy-guided BERT with Global and Local hierarchies (HBGL), which utilizes the large-scale parameters and prior language knowledge of BERT to model both global and local hierarchies. Moreover, HBGL avoids the intentional fusion of semantic and hierarchical modules by directly modeling semantic and hierarchical information with BERT. Compared with the state-of-the-art method HGCLR, our method achieves significant improvement on three benchmark datasets.
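The poor performance of the original BERT on sentence semantic similarity has been widely discussed in previous works. We find that the unsatisfactory performance is mainly due to static token embedding biases and ineffective BERT layers, rather than the high cosine similarity of the sentence embeddings. To this end, we propose a prompt-based sentence embedding method that reduces token embedding biases and makes the original BERT layers more effective. By reformulating the sentence embedding task as a fill-in-the-blanks problem, our method significantly improves the performance of the original BERT. We discuss two prompt representation methods and three prompt search methods for prompt-based sentence embeddings. Moreover, we propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings. For experiments, we evaluate our method in both non-fine-tuned and fine-tuned settings. Even the non-fine-tuned method can outperform fine-tuned methods such as unsupervised ConSERT on STS tasks. Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings. Compared to SimCSE, we achieve improvements of 2.29 and 2.58 points on BERT and RoBERTa, respectively, under the unsupervised setting.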
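End-to-end models are becoming popular approaches for mispronunciation detection and diagnosis (MDD). A streaming MDD framework, which many practical applications require, remains a challenge. This paper proposes a streaming end-to-end MDD framework called CCA-MDD. CCA-MDD supports online processing and is able to run in real time. The encoder of CCA-MDD consists of a streaming acoustic encoder based on a conv-Transformer network and an improved cross-attention named coupled cross-attention (CCA). The coupled cross-attention integrates encoded acoustic features with pre-encoded linguistic features. An ensemble of decoders trained with multi-task learning is applied for the final MDD decision. Experiments on publicly available corpora demonstrate that CCA-MDD achieves performance comparable to published offline end-to-end MDD models.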
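Contrastive learning approaches have achieved great success in learning visual representations with few labels of the target classes. This suggests the tantalizing possibility of scaling them beyond a curated "seed" benchmark by incorporating more unlabeled images from internet-scale external sources to enhance their performance. In practice, however, a larger amount of unlabeled data requires more computing resources because of the bigger model size and longer training needed. Moreover, open-world unlabeled data usually follows an implicit long-tailed class or attribute distribution, and many of the samples do not belong to the target classes. Blindly leveraging all unlabeled data can therefore lead to data imbalance as well as distraction issues. This motivates us to seek a principled approach to strategically select unlabeled data from an external source, in order to learn generalizable, balanced, and diverse representations for the relevant classes. In this work, we present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK), which follows three simple principles: (1) tailness, which encourages sampling examples from tail classes by sorting the empirical contrastive loss expectation (ECLE) of samples over random data augmentations; (2) proximity, which rejects out-of-distribution outliers that may distract training; and (3) diversity, which ensures diversity in the set of sampled examples. Empirically, using ImageNet-100-LT (without labels) as the seed dataset and two "noisy" external data sources, we demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features, as evaluated via linear classifier evaluation in full-shot and few-shot settings. Code is available at: https://github.com/vita-group/mak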
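The diversity principle above is named after the K-center problem. As an illustration only, the following is a minimal sketch of the generic greedy K-center heuristic (farthest-first traversal) over feature vectors. It is not the MAK implementation: the abstract does not specify how tailness, proximity, and diversity are combined, and the function name `greedy_k_center` and its parameters are hypothetical.

```python
import math


def greedy_k_center(points, seed_indices, budget):
    """Generic farthest-first traversal (an assumption, not MAK's exact procedure).

    points:       list of equal-length feature tuples
    seed_indices: indices already in the selected set (e.g. the seed data)
    budget:       maximum number of additional points to pick
    Returns the newly picked indices, in selection order.
    """
    if not seed_indices:
        raise ValueError("need at least one seed index")
    chosen = set(seed_indices)
    # min_dist[i] = distance from point i to its nearest selected center
    min_dist = [min(math.dist(p, points[s]) for s in chosen) for p in points]
    picked = []
    for _ in range(min(budget, len(points) - len(chosen))):
        candidates = [i for i in range(len(points)) if i not in chosen]
        # Pick the candidate that is farthest from everything selected so far.
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        chosen.add(nxt)
        # Update nearest-center distances with the new center.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked
```

For example, `greedy_k_center([(0, 0), (1, 0), (10, 0), (10, 1), (5, 5)], [0], 2)` selects the two points that are farthest from the seed and from each other, which is the sense in which the selected set is diverse.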
We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search and code documentation generation. We develop CodeBERT with a Transformer-based neural architecture, and train it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators. This enables us to utilize both "bimodal" data of NL-PL pairs and "unimodal" data, where the former provides input tokens for model training while the latter helps to learn better generators. We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters. Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation. Furthermore, to investigate what type of knowledge is learned in CodeBERT, we construct a dataset for NL-PL probing, and evaluate in a zero-shot setting where the parameters of pre-trained models are fixed. Results show that CodeBERT performs better than previous pre-trained models on NL-PL probing.
This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of a Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1, surpassing SoTA CNN-/Transformer-based models while balancing model accuracy and efficiency well.
We aim to bridge the gap between our common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from the von Neumann-Landauer principle. Modelling human learning is difficult, as how people learn varies from one person to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models of such learning including the Free Energy Principle and Bayesian Program Learning, approximate our theory under the Church-Turing thesis. We find that deep generative models such as the variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models, including deep neural networks, for image recognition, low-resource language processing, and character recognition.